The Demo Corner: Stretching Sprites by Pasi 'Albert' Ojala (po87553@cs.tut.fi)
        
(All timings are in PAL, principles will apply to NTSC too)

You might have heard that it is possible to expand sprites to more than
twice their original size. Imagine a sprite scroller with 6-times expanded
sprites. However, there is no need to expand all of them equally. Using
this technique, it is possible to make easy sinus effects and constantly
expanding and shrinking letters.

The VIC (video interface controller) can be fooled in many ways. One of
them is the vertical expansion of sprites. If you clear the expand flag and
then set it back straight away, VIC will think it has only displayed the
first one of the expanded lines. If we do the trick again, VIC will continue
to display the same data again and again. But why does VIC behave like this?


_Logic gates will tell the truth_

It is not really a bug, but a feature. The hardware design to implement the
vertical enlargement was made as simple as possible. Those who do not care
about hardware may skip this part... The whole y-enlargement is handled
with five simple logic gates. Each sprite has an associated Set-Reset
flip-flop to tell whether to jump to the next sprite line (add three bytes
to the data counter) or not.

Let's call the state of the flip-flop Q and the inputs R (reset) and S (set).
The function of an SR flip-flop is quite simple: if R is one, Q goes to zero;
if S is one, Q goes to one. Otherwise the state of the flip-flop does not
change. In this case the flip-flop is set if either the Y-enlargement bit
is zero or the state of the flip-flop is zero at the end of a scan line. The
flip-flop is reset if both the state and the Y-enlargement bit are ones at
the end of the line.

When you clear the bit in the vertical expansion register, the flip-flop will
be set regardless of the electron beam position on the scan line. If you
set the bit again before the end of the line, the flip-flop will be cleared
and VIC will be displaying the same sprite line again. In other words, VIC
will think that it is starting to display the second line of the expanded
sprite row. This way any of the lines in any of the sprites may be stretched
as wanted.

 .---- Current flipflop state (if one, enables add to sprite pointer)
 |  .---- Y-expansion bit.
 |  |  .--- End of line pulse (briefly one at end of line)
 |  |  |  .--- Next state (What state will become under these conditions)
 |  |  |  |
 0  0  0  1
 0  0  1  1
 0  1  0  no change
 0  1  1  1
 1  0  0  1             Clear $D017 -> flip-flop is set
 1  0  1  1
 1  1  0  no change     Set $D017   -> flip-flop resets at the end of line
 1  1  1  0

So, simply, at any time, if vertical expand is zero, the add enable is set
to one. At the end of the line - before adding - the state is inverted if
vertical expand is one: a one becomes zero and, as the table shows, a zero
becomes one.
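
The truth table can also be read as a small function. The following is only
a sketch in Python (not C64 code); the name next_state is made up for this
illustration, and it models the table above, not the actual gates inside VIC.

```python
def next_state(q, y_expand, end_of_line):
    """Next state of one sprite's expansion flip-flop (1 = add enabled)."""
    if y_expand == 0:
        return 1          # expansion bit clear: flip-flop is set at any time
    if end_of_line:
        return 1 - q      # expansion bit set: state is inverted at end of line
    return q              # expansion bit set, mid-line: no change


# Print the eight rows in the same order as the table in the text.
for q in (0, 1):
    for y_expand in (0, 1):
        for end_of_line in (0, 1):
            nxt = next_state(q, y_expand, end_of_line)
            note = "no change" if nxt == q and y_expand and not end_of_line else nxt
            print(q, y_expand, end_of_line, note)
```

Clearing the bit and setting it again before the end of the line is then
next_state(q, 0, 0) followed by next_state(1, 1, 1), which gives zero, so
the same sprite line is shown again.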


_Even odder?_

Something very weird happens when we clear the expansion bit right when VIC
is adding three to the sprite image counters. The values in the counters will
be increased only by two, and the data is then read from the wrong place.

Normally the display of a sprite ends when VIC has shown all of the 21
lines of the sprite (the counter will end up at $3f). If there has been a
counter mixup, $3f is not reached after 21 lines and VIC will go on counting
and will display the sprite again, now normally. If we fool the counter only
once, the counter value $3f is reached when the sprite has been displayed
twice.


_Fiddling_

I don't think the distorted counter effect can be used for anything, but
there are many things where the variable stretching could be used. When you
open the borders, you can be sure that there is a constant amount of time
on each line, if you stretch the sprites to the whole length of the area.
You may stretch only the first and last lines, stretch the other lines by a
constant amount or according to a fixed or variable table, or use any
combination of these.


_A raster routine is a must_

Because you have to access the VIC registers on each line during the stretch,
you need some kind of routine which can do other kinds of tricks besides the
stretch. You can open the side borders and change the background color and
maybe you have to shift the screen (and the bad lines with it) downwards.
[See previous C=Hacking Issues for talk about raster interrupts.]

Look at the demo program. In the beginning of the raster routine there is
first some timing, then a loop that lasts exactly 46 clock cycles. VIC takes
another 17 cycles per line for the sprites, so each pass takes 63 cycles,
exactly one scan line. Inside the loop we first do the necessary
modifications to the vertical scroll register, then we change the background
color and then we open the side borders. Finally we handle the stretching
using the stretch data, where a zero-bit means that the corresponding sprite
will be stretched. A one-bit means that VIC is allowed to go to the next line
of the sprite data.
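
To see what the two writes to $D017 do on each line, here is a rough Python
model for a single sprite. It assumes the flip-flop behaves as in the truth
table earlier in this article; the names next_state and displayed_rows are
invented for the sketch, and the exact cycle at which VIC samples the
flip-flop is not modelled.

```python
def next_state(q, y_expand, end_of_line):
    """Expansion flip-flop, following the truth table in the text."""
    if y_expand == 0:
        return 1
    return 1 - q if end_of_line else q


def displayed_rows(stretch_bits, q=1):
    """Return the sprite data row shown on each raster line.

    stretch_bits[i] is this sprite's bit in the stretch table for line i:
    0 = stretch (repeat the row), 1 = let VIC go to the next row.
    """
    row = 0
    rows = []
    for bit in stretch_bits:
        rows.append(row)
        q = next_state(q, bit, 0)      # STA $D017 with the table byte
        q = next_state(q, 1 - bit, 0)  # EOR #$FF, STA $D017 with the inverse
        q = next_state(q, 1 - bit, 1)  # end of line, register holds the inverse
        if q:
            row += 1                   # add enabled: next sprite row
    return rows


print(displayed_rows([1, 0, 0, 1, 1, 0]))
```

In this model a zero-bit always ends the line with the flip-flop at zero, so
the row repeats, and a one-bit always ends it at one, so the row advances.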


_Stretching takes time_

Besides showing the stretched sprites we need time to generate the stretching
data, unless of course the stretch is constant. We have to have 20
one-bits for each sprite in our table. It is not feasible to determine the
state of each byte in the table; instead you clear the table and plot the
needed bits.

The routine is quite straightforward, but many optimizations may be applied
to make it faster. First we load Y with the stretch of the first line (the
y-coordinate of the data). Then we use it as an index to the table, plot
the right bit and increase Y by the expansion value. Then we do it again
until we have all of the 20 bits scattered over the table. The last sprite
line will then stretch until we stop the stretching, because VIC never gets
a 21st one-bit that would let it finish that line.
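
The plotting step can be sketched in a few lines of Python. This is an
illustration of the idea only, with an integer expansion value; the function
name plot_stretch and the 128-entry list standing in for the table are
assumptions of the sketch, not part of the demo code.

```python
def plot_stretch(table, mask, y, expansion, lines=20):
    """Plot `lines` one-bits for one sprite into a cleared stretch table.

    table     -- list of bytes, one per raster line
    mask      -- the sprite's bit ($01, $02, $04, ...)
    y         -- table index of the first one-bit
    expansion -- number of raster lines each sprite row should take
    Returns the index just past the last plotted row.
    """
    for _ in range(lines):
        table[y] |= mask   # load, OR in the mask, store
        y += expansion     # step to the line where the next row ends
    return y


table = [0] * 128                  # cleared table
plot_stretch(table, 0x01, 10, 3)   # sprite 0: every row three lines high
plot_stretch(table, 0x02, 20, 2)   # sprite 1: every row two lines high
print(sum(1 for b in table if b & 0x01), sum(1 for b in table if b & 0x02))
```

Each sprite gets exactly 20 one-bits, and everything else in the table stays
zero, which is why clearing first and plotting afterwards is enough.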


_Speed is everything_

The calculation itself is easy, but optimizing the routine is not. If all
of the sprites are stretched equally (by integer amounts) and from the same
position, the routine is the fastest possible.  You can also have variable
and smooth stretch.  Smooth stretch uses other than integer expansion values
and thus also needs more processor time.  If each sprite has to be stretched
individually, you need much more time to do it.

The fastest routine I have ever written uses some serious selfmodification
tricks. There are also some other tricks to speed up the stretch, but they
are all secret ones.. :-)  Well, what the h*ck, I will include it anyway.
By the time you read this I have already made a faster routine..

You can speed up that routine (by 17%) by unrolling the inner loop, but you
have to use a different addressing mode for ORA (zero-page). You also need
to place some restrictions on the tables used. If you unroll both loops,
you can get a routine about 25% faster than the Fore!-version.


_Demo program_

I tried to collect all of the main principles of stretching and raster
routines to the demo program. I use the term "raster routine" when the
execution is tightly synchronized to the electron beam and to the screen
display. The program may be unclear in places, but I wanted to keep it as
short as possible. The routine opens the side borders, scrolls the screen
vertically, changes the background color and stretches the sprites.

The stretcher routine allows a different y-position and amount of expansion
for each sprite. This routine uses 1/8 fractions to do the counting, and so
it is much too slow to use in a real demo.  VIC registers are initialized
from a table, instead of setting them separately. Interrupt position is one
line above the sprites. The program does not open the top or bottom borders.
(I usually use an NMI to open the vertical borders, so that I only need to
 use one raster-IRQ position.)
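
The 1/8-fraction counting can be pictured with the Python sketch below. It
follows the idea of the inner loop of the demo's stretcher (keep the low
three bits as the remainder, add the height, use the upper bits as the step),
but it is a simplified model: the name plot_fractional is made up, the
remainder starts from zero for every call, and the table offset used in the
demo listing is left out.

```python
def plot_fractional(table, mask, y, height_eighths, lines=20):
    """Plot one-bits with a row height given in 1/8 line units.

    height_eighths = 8 means one line per row, 12 means 1.5 lines, etc.
    Returns the index just past the last plotted row.
    """
    remainder = 0
    for _ in range(lines):
        table[y] |= mask
        total = (remainder & 7) + height_eighths  # old fraction + height
        remainder = total                         # low 3 bits carry over
        y += total >> 3                           # integer part moves y
    return y


table = [0] * 128
end = plot_fractional(table, 0x80, 0, 12)   # 1.5 lines per sprite row
print(end, [i for i, b in enumerate(table) if b][:6])
```

With a height of 12 the steps alternate between one and two lines, so 20
rows cover 30 raster lines.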

I tried to make an NTSC version, but I couldn't get it to synchronize.
There are also fewer cycles available so you can't stretch all of the sprites
individually in NTSC (with this routine that is..).

--------------------------------------------------------------------------
Fast-stretch from Megademo92 (part: Fore!)

SINPOS          Stretch sinus index
SINSPEED        Stretch sinus index speed
YSINPOS         Y-sinus index
YSINSPEED       Y-sinus index speed
MASK            Bit mask for passes (usually $01,$02,$04,$08,$10..)

YSINUS          Y-sinus table
STRETCH         Sprite line sizes   (LSB of the address must be 0)
SIZET           Sprite size/2 table (LSB of the address must be 0)
DATA            Stretch data table (cleared before this routine)

[xx] marks selfmodification. For example, the loop counter, the bit mask and
the index to the stretch and size data tables are stored straight in the
code.

0b90    lda #$06        ; Number of sprites-1 (here I used only 7 sprites)
0b92    sta $0b96
0b95    ldx #$[ff]      ; Load counter
0b97    clc             ; Clear carry for adc
0b98    lda SINPOS,x    ; Stretch sinus position
0b9b    sta $0bd1       ; Set low bytes of indices
0b9e    sta $0bb8
0ba1    adc SINSPEED,x  ; Add stretch sinus speed (carry is not set)
0ba4    and #$7f        ; Table is 128 bytes (twice)
0ba6    sta SINPOS,x    ; Save new sinus position
0ba9    lda YSINPOS,x   ; Get the Y sinus position
0bac    adc YSINSPEED,x ; Add Y sinus speed
0baf    sta YSINPOS,x   ; Save new Y sinus position
0bb2    tay             ; Position to index register
0bb3    lda YSINUS,y    ; Get Y-position from table (can be 256 bytes long)
0bb6    sec             ; adc either sets or clears carry, we have to set it
0bb7    sbc SIZET[1e]   ; Subtract size of the sprite/2 to get the sprite
0bba    clc             ;  to stretch from the middle.
0bbb    tay             ; MaxSize/2 < Y-sinus < AreaHeight-MaxSize/2
0bbc    lda MASK,x      ; Get the ora-mask for this pass
0bbf    sta $0bcb       ; Store mask
0bc2    sta $0bdb
0bc5    ldx #$13        ; 19 lines here + 1 after
0bc7    lda DATA,y      ; Load & ora-mask & store
0bca    ora #[$01]
0bcc    sta DATA,y
0bcf    tya
0bd0    adc STRETCH[1e],x ; Add the stretch from the table (carry is not set)
0bd3    tay
0bd4    dex             ; decrease counter
0bd5    bne $0bc7       ; Do the 19 lines
0bd7    lda DATA,y      ; Load & ora-mask & store the 20th line
0bda    ora #[$01]
0bdc    sta DATA,y
0bdf    dec $0b96       ; Next sprite(s)
0be2    bpl $0b95
0be4    rts

Timings:
-------
clear 128 bytes: 514  + 12 cycles       8.16 lines
7 passes       : 3820 + 12 cycles       60.6 lines = 8.66 lines/pass

The unrolled clear routine consists of one load (lda #$00) and 128
store instructions (sta $nnnn). 12 cycles are counted for JSR/RTS.

Stretching of 8 sprites would take slightly less than 80 lines, which is one
fourth of the total raster time. Displaying a 128-line high stretcher takes
about 130 lines (counting sprite setup and synchronization), and a scroller a
couple of lines more. The total of 212 lines leaves 100 lines (6300 cycles)
free for other activities in a PAL system. In an NTSC system you would have
only 50 lines left.
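
The figures above can be rechecked with a little arithmetic, here written as
a Python sketch. The 63 cycles per PAL line come from the raster loop of the
demo program, and the 312 lines per PAL frame are implied by 212 + 100; the
variable names are only for this illustration. The line counts in the
timing table leave out the 12 cycles for JSR/RTS, and so does this check.

```python
CYCLES_PER_LINE = 63    # PAL
LINES_PER_FRAME = 312   # PAL, implied by 212 used + 100 free

clear_cycles = 2 + 128 * 4   # LDA #$00 (2) plus 128 x STA $nnnn (4 each)
pass_cycles = 3820 / 7       # measured total for 7 passes

clear_lines = clear_cycles / CYCLES_PER_LINE
pass_lines = pass_cycles / CYCLES_PER_LINE
eight_sprites = clear_lines + 8 * pass_lines

free_lines = LINES_PER_FRAME - 212
free_cycles = free_lines * CYCLES_PER_LINE

print(clear_cycles, round(clear_lines, 2))   # cycles and lines for the clear
print(round(pass_lines, 2))                  # lines per pass
print(round(eight_sprites, 1))               # lines for eight sprites
print(free_lines, free_cycles)               # what is left over
```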


A simple BASIC routine to create the stretch data:
-------------------------------------------------
a=0:for f=0 to 127:a=a+Height*(2+sin(f*PI/64)):poke Table+f,a:
poke Table+f+128,a:a=a-int(a):next f

This will also handle the 'rounding'. Because of this we don't have to
handle fractions in the stretcher routine. The use of a table also gives the
opportunity to have a separate size for each sprite line. The table does
not need to be a sinus, it could have triangle or any other 'waveform' as
long as the minimum value in the table (sprite line size) is 1.
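
For readers without a C64 at hand, the same table can be tried out in
Python. This is a loose port of the BASIC line above, not the original tool:
the function name make_stretch_table is invented, and int() stands in for
the truncation that POKE does.

```python
import math


def make_stretch_table(height=1.0, entries=128):
    """Build the sprite line size table, carrying fractions forward."""
    table = []
    a = 0.0
    for f in range(entries):
        a += height * (2 + math.sin(f * math.pi / (entries / 2)))
        table.append(int(a))   # POKE stores only the integer part
        a -= int(a)            # the fraction is added to the next entry
    return table + table       # second copy, as at Table+128 in the BASIC


table = make_stretch_table()
print(len(table), min(table), max(table))
```

Because the fraction is carried from entry to entry, the integer sizes add
up to nearly the same total as the exact values would, which is the
'rounding' mentioned above.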


A BASIC routine to do the size/2 table:
--------------------------------------
a=0:for f=0 to 19:a=a+peek(Table+f):next f: rem get the size in position 0
for f=0 to 127:poke STable+f,a/2:a=a-peek(Table+f)+peek(Table+f+20):next f
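
The size/2 table is a sliding sum over 20 consecutive line sizes. A small
Python sketch of the same calculation follows; make_size_table is a
hypothetical name, and the input list must hold at least entries + window
values (the doubled 256-entry stretch table does).

```python
def make_size_table(stretch, window=20, entries=128):
    """Half of the total height of `window` lines, for each start index."""
    a = sum(stretch[:window])          # size of a sprite starting at index 0
    sizes = []
    for f in range(entries):
        sizes.append(a // 2)           # POKE keeps the integer part of a/2
        a = a - stretch[f] + stretch[f + window]   # slide the window by one
    return sizes


# With every line two raster lines high, a sprite is 40 lines, half is 20.
print(make_size_table([2] * 256)[:4])
```

Sliding the window costs one subtraction and one addition per entry instead
of summing 20 values again each time.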

--------------------------------------------------------------------------
_Stretcher program_

YSCROLL= $CF00 ; Vertical scroll table (moves bad lines)
STRETCH= $CF80 ; Stretch table
COLORS=  $CE80 ; Table for background colors
YCOORD=  $0380 ; Sprite y-positions (eight bytes)
HEIGHT=  $0388 ; Sprite stretches   (eight bytes)
YPOS=    52    ; Sprite y-coordinate
SPRCOL=  2     ; Sprite colors


*= $C000

        SEI             ; Disable interrupts
        LDA #$7F
        STA $DC0D       ; Disable timer interrupts
        LDA #<IRQ       ; Our own interrupt handler
        STA $0314
        LDA #>IRQ
        STA $0315
        LDX #$3E        ; We create a sprite to cassette buffer
LOOP    LDA SPRITE,X
        STA $0340,X
        DEX
        BPL LOOP
        LDX #7
LOOP2   LDA #$D         ; Set the sprite image pointers
        STA $07F8,X
        LDA #SPRCOL     ; Set sprite colors
        STA $D027,X
        DEX
        BPL LOOP2
        LDX #$26
LOOP3   LDA VIDEO,X     ; Init VIC
        STA $D000,X
        DEX
        BPL LOOP3
        LDX #$7F        ; Create the y-scroll table
LOOP4   TXA             ;  and clear the color table
        AND #$07
        ORA #$10        ; Non-blank screen
        STA YSCROLL,X
        LDA #$00
        STA COLORS,X
        DEX
        BPL LOOP4
        STA $3FFF       ; A is still 0: clear the byte VIC shows in idle state
        LDX #23         ; Create a color table
LOOP5   LDA BACK,X
        STA COLORS+8,X
        STA COLORS+32,X
        STA COLORS+56,X
        STA COLORS+80,X
        STA COLORS+96,X
        DEX
        BPL LOOP5
        JSR CHANGE      ; Init sprite sizes and y-positions
        CLI             ; Enable interrupts
        RTS

IRQ     LDX #$01
        LDY #$08        ; 'normal' $D016
        NOP             ; Timing
        NOP
        NOP
        BIT $EA         ; (Add NOP's etc. for NTSC)
LOOP6   LDA YSCROLL-1,X ; Move the screen (bad lines)      5
        STA $D011       ;                                  4
        LDA COLORS,X    ; Load the background color        4
        DEC $D016       ; Open the border                  6
        STA $D021       ; Set the background color         4
        STY $D016       ; Screen to normal                 4
        LDA STRETCH,X   ; Stretch the sprites              4
        STA $D017       ;                                  4
        EOR #$FF        ;                                  2
        STA $D017       ;                                  4
                        ; (Add NOP for NTSC     +2)
        INX             ; Increase counter                 2
        BPL LOOP6       ; Loop 127 times                 + 3
                        ;                                ---
        LDA #1          ; Ack the raster interrupt       =46
        STA $D019       ;                                +17(sprites)
                        ;                                ---
        JSR DOSTRETCH   ; New stretch                    =63(whole)

        JMP $EA31

SPRITE  BYT 0,0,0,3,$FB,0,7,$7E          ; An Example sprite
        BYT 0,$35,$DF,0,$1D,$77,0,$B7
        BYT $5D,0,$BD,$83,$7E,$EF,1,$DE
        BYT $BB,1,$78,$AE,3,$70,$EB,0
        BYT 0,$BA,3,$60,$EE,3,$D8,$FB
        BYT 2,$F6,$FE,$83,$BD,$9F,$BA,0
        BYT $37,$EE,0,$3D,$FB,0,7,$7E
        BYT 0,3,$DF,0,0,0,0

VIDEO   BYT $E8,YPOS,$20,YPOS,$50,YPOS,$80,YPOS,$B0,YPOS
        BYT $E0,YPOS,$10,YPOS,$40,YPOS,$C1,$18,YPOS-1,0,0
        BYT $FF,8,$FF,$15,1,1,$FF,$FF,$FF,0,0,0,0,0,0,0,1,10
        ; Init values for VIC - sprites, interrupts, colors

BACK    BYT 0,$B,$C,$F,1,$F,$C,$B   ; Example color bars
        BYT 0,6,$E,$D,1,$D,$E,6
        BYT 0,9,2,$A,1,$A,2,9

DOSTRETCH
        LDX #31            ; Clear the table
        LDA #0             ; (Unrolling will help the speed,
LOOP7   STA STRETCH,X      ;  because STA nnnn,X is 5 cycles
        STA STRETCH+32,X   ;  and STA nnnn is only 4 cycles.)
        STA STRETCH+64,X
        STA STRETCH+96,X
        DEX
        BPL LOOP7
        STA REMAIND+1      ; Clear the remainder
        LDA #7
        STA COUNTER+1      ; Init counter for 8 loops
        LDA #$80
        STA MASK+1         ; First sprite 7, mask is $80
COUNTER LDX #$00           ; The argument is the counter
        LDY YCOORD,X       ; y-position
        LDA HEIGHT,X       ; Height of one line (5 bit integer part)
        STA ADD+1
        LDX #20            ; Handle 20 lines
LOOP8   LDA STRETCH+2,Y
MASK    ORA #$00
        STA STRETCH+2,Y    ; Set a one-bit
        STY YADD+1
REMAIND LDA #0
        AND #7             ; Previous remainder
ADD     ADC #0             ;  add to the height
        STA REMAIND+1      ; Save the new value
        LSR
        LSR
        LSR
        CLC                ; Take the integer part
YADD    ADC #0
        TAY                ; New value to y-register
        DEX
        BNE LOOP8
        LSR MASK+1         ; Use new mask
        DEC COUNTER+1      ; Next sprite
        BPL COUNTER
                           ; No RTS here: execution falls through to CHANGE

CHANGE  LDA #$00
        ASL                ; Sprite height changes with 2x speed
        AND #$3F
        TAY                ; 64 bytes long table
        INC CHANGE+1       ; Increase the counter
        LDX #7             ; Do eight sprites
LOOP9   LDA SINUS,Y
        LSR
        LSR
        CLC                ; Use the same sinus as y-data
        ADC #8
        STA HEIGHT,X       ; Sprite height will be from 1 to 3 lines
        TYA
        ADC #10            ; Next sprite enlargement will be 10 entries
        AND #$3F           ;  from this
        TAY
        DEX
        BPL LOOP9
        LDX #7
        LDA CHANGE+1
        AND #$3F
        TAY
LOOP10  LDA SINUS,Y        ; Y-position
        STA YCOORD,X
        TYA
        ADC #10            ; Next sprite position is 10 entries from this one
        AND #$3F
        TAY
        DEX
        BPL LOOP10
        RTS

SINUS   BYT $20,$23,$26,$29,$2C,$2F,$31,$34 ; A part of a sinus table
        BYT $36,$38,$3A,$3C,$3D,$3E,$3F,$3F
        BYT $3F,$3F,$3F,$3E,$3D,$3C,$3A,$38
        BYT $36,$34,$31,$2F,$2C,$29,$26,$23
        BYT $20,$1C,$19,$16,$13,$10,$E,$B
        BYT 9,7,5,3,2,1,0,0,0,0,0,1,2,3,5,7
        BYT 9,$B,$E,$10,$13,$16,$19,$1C

